ABSTRACT


This report describes a system by which an autonomous land vehicle might improve
its estimate of its current position. The system selects visible landmarks from a
database of knowledge about its environment and controls a camera's direction and
focal length to obtain images of these landmarks. The landmarks are then located
in the images using a modified version of the generalized Hough transform, and
their locations are used to triangulate a new estimate of vehicle position and
position uncertainty.


1. Introduction


The research described in this report is part of a larger project which has as
its final goal the demonstration of an autonomous, visually guided land vehicle
[Davi85, Waxm85]. The vehicle will reach a desired destination by planning and
following dynamically chosen and revised paths. As planning proceeds, the vehicle
is commanded to move from point to point. However, errors in this movement create
a positional uncertainty proportional to the distance travelled. It is therefore
desirable to have a sub-system which can re-calculate the current position and
reduce the uncertainty to within acceptable limits. A collection of algorithms
for such a system has been designed and partially implemented in a research
environment. The system uses knowledge of the vehicle's approximate position to
visually locate known landmarks. It then triangulates using the bearings of the
known landmarks to acquire a new position with a reduced uncertainty.

The  system  is  composed  of  three  modules,  called  the  MATCHER,  the 
FINDER,  and  the  SELECTOR,  that  interact  to  establish  the  vehicle’s  position 
with  a  new  level  of  uncertainty. 

1)  The  MATCHER  locates  likely  positions  for  one  or  more  landmarks  in  an 
image,  and  rates  these  locations  according  to  some  measure  of  confidence. 


2) The FINDER controls the pointing direction and focal length of the camera to
acquire specified images for a set of landmarks and directs the MATCHER to find
possible locations for these landmarks in the images. It then eliminates possible
locations for individual landmarks which are not consistent with the possible
locations found for other landmarks. The FINDER then evaluates the remaining
possible locations to determine the actual locations of the given landmarks.

3) The SELECTOR identifies a set of landmarks whose recognition in images of
appropriate angular resolution would improve the position estimate of the vehicle
by the desired amount. It then directs the FINDER to establish likely locations in
such images for subsets of those landmarks. With these locations, the SELECTOR
then computes new estimates of the vehicle position and position uncertainty and
directs the FINDER, if necessary, to locate additional subsets of landmarks.

We assume the vehicle's camera is mounted on a computer-controlled pan and tilt
mechanism and has a computer-adjustable focal length. We also assume estimates
are available for the heading of the vehicle, as well as the current settings of
the pan, tilt, and focal length of the camera. A database of landmarks exists
that includes all pertinent landmark qualities, such as size and position, and at
least one representation of each landmark from which it could be recognized in an
image.

Chapters 2, 3, and 4 describe the MATCHER, FINDER, and SELECTOR, respectively.
In Chapter 5, we describe an implementation of the algorithms. Related literature
is discussed in Chapter 6.
2. The MATCHER


2.1.  Overview 

A generalized Hough transform is employed to locate landmarks of known image
orientation and scale. The landmarks are represented by lists of boundary points
which are individually matched to edge points in the image. The algorithm consists
of three main phases: edge point detection, matching of the template to the edge
points, and interpreting the results of the matching. Edges are detected as points
where the Laplacian changes sign and the local grey levels have a high symmetric
difference. Matching is done using the generalized Hough transform and is
restricted in two ways. First, template points match only points having close to
the same gradient direction. Second, only those template points are used whose
gradient directions have a high measure of informativeness; this measure is
defined in Section 2.3.3. Throughout this chapter, the description of the
algorithm will be supplemented by references to the two examples in Figures 2.1
and 2.2.

2.2.  Getting  the  edge  image 

The image given is a grey-level picture (Figures 2.1a and 2.2a). It is first
smoothed using a local edge-preserving smoothing operator. This operator, the
symmetric nearest-neighbor algorithm, can be computed on any size neighborhood,
but is simplest to describe for a 3x3. A brief description of the 3x3 algorithm
is provided below; for a full explanation see Harwood, Subbarao, and Davis
[Harw84].

For each pixel p in the image, consider a 3x3 neighborhood. For each pair of
symmetrically opposed neighbors, choose the neighbor whose grey level is nearest
in value to the center pixel's. In the case of a 3x3, there will be four pairs
and thus four points chosen. Replace p with the mean of these points. One could
also use the median of the four points, which results in better preservation of
corners; however, it is slightly slower than using the mean, and often the
improvement is not enough to warrant the extra time.

The implementation used in this thesis is a 5x5 version of this algorithm, with
the mean computed over the chosen nearest neighbors. Also, since the algorithm's
results improve with iteration, two iterations are done on the image (Figures 2.1b
and 2.2b).
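The smoothing step lends itself to a short array sketch. The Python/NumPy version below is a minimal illustration, not the report's implementation: `snn_smooth` and its parameters are hypothetical names, borders are handled by replicating edge pixels (an assumption the text does not address), and only the mean variant is shown. Setting `half=2, iterations=2` corresponds to the 5x5, two-pass configuration described above.

```python
import numpy as np


def snn_smooth(image, half=1, iterations=1):
    """Symmetric nearest-neighbor smoothing, mean variant.

    half=1 gives the 3x3 operator (4 symmetric pairs); half=2 gives 5x5 (12 pairs).
    Borders are handled by replicating edge pixels (an assumption).
    """
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    for _ in range(iterations):
        padded = np.pad(img, half, mode="edge")
        total = np.zeros_like(img)
        pairs = 0
        for dy in range(-half, half + 1):
            for dx in range(-half, half + 1):
                # Visit each symmetric pair once: keep offsets that precede the
                # centre in raster order; (-dy, -dx) is the opposite member.
                if dy > 0 or (dy == 0 and dx >= 0):
                    continue
                a = padded[half + dy:half + dy + h, half + dx:half + dx + w]
                b = padded[half - dy:half - dy + h, half - dx:half - dx + w]
                # Choose the member of the pair whose grey level is nearest the centre's.
                total += np.where(np.abs(a - img) <= np.abs(b - img), a, b)
                pairs += 1
        img = total / pairs  # replace each pixel by the mean of the chosen neighbors
    return img
```

Because each pair contributes the member lying on the same side of an edge as the center pixel, a clean step edge passes through unchanged, while an isolated bright pixel in a flat region is averaged away.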

A neighborhood size for the Laplacian appropriate to the size of the object being
sought is then selected, and the Laplacian is convolved with the smoothed image.
At present, this selection is done manually, and usually the size is very small,
such as 3x5 or 3x7. However, the selection could be done automatically using such
criteria as the image size of the object, the density of edge points, or the
average local standard deviation of edge direction.

At any point where the Laplacian crosses zero (a positive pixel with a negative
neighbor), the local symmetric contrast is computed. This is done by taking the
maximum absolute difference in grey level of any two symmetrically opposite
points in a 3x3 neighborhood. This is a fast and isotropically smooth edge
strength measure, and it is used to eliminate false or weak edge points given by
the zero-crossings of the Laplacian. Using this measure of edge strength, the
weakest 75 percent of the zero-crossing points are eliminated. The result is an
image of thin contours (due to the zero-crossing operator) whose edge strength is
significant (Figures 2.1c-e and 2.2c-e). Many of the contours are broken by
single pixels of low contrast, but this does not affect the matching procedure,
which matches patterns of individual points rather than patterns of extended
contours.
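A sketch of the edge-point stage follows, assuming NumPy, a fixed 3x3 four-neighbor Laplacian in place of the manually chosen operator size, and an eight-neighbor reading of "negative neighbor". `edge_points` and `keep_fraction` are hypothetical names; retaining 25 percent matches the 75 percent eliminated above.

```python
import numpy as np

# One offset from each of the four symmetric pairs in a 3x3 neighborhood;
# negating an offset gives the opposite member of the pair.
PAIR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1)]


def _shift(padded, dy, dx, h, w):
    """View of a 1-pixel-padded array displaced by (dy, dx)."""
    return padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]


def edge_points(image, keep_fraction=0.25):
    """Boolean mask of zero-crossing points that survive the contrast test."""
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    g = np.pad(img, 1, mode="edge")
    # Assumption: a 3x3, 4-neighbor Laplacian stands in for the per-object operator.
    lap = (_shift(g, -1, 0, h, w) + _shift(g, 1, 0, h, w)
           + _shift(g, 0, -1, h, w) + _shift(g, 0, 1, h, w) - 4.0 * img)
    lp = np.pad(lap, 1, mode="edge")
    # Zero crossing: a positive pixel with at least one negative 8-neighbor.
    has_negative = np.zeros((h, w), dtype=bool)
    for dy, dx in PAIR_OFFSETS:
        has_negative |= (_shift(lp, dy, dx, h, w) < 0) | (_shift(lp, -dy, -dx, h, w) < 0)
    crossings = (lap > 0) & has_negative
    # Local symmetric contrast: largest grey-level difference across opposed pairs.
    contrast = np.zeros((h, w))
    for dy, dx in PAIR_OFFSETS:
        diff = np.abs(_shift(g, dy, dx, h, w) - _shift(g, -dy, -dx, h, w))
        contrast = np.maximum(contrast, diff)
    if not crossings.any():
        return crossings
    # Keep only the strongest keep_fraction of the zero-crossing points.
    threshold = np.quantile(contrast[crossings], 1.0 - keep_fraction)
    return crossings & (contrast >= threshold)
```

On a vertical step from 0 to 100, the zero crossings form a one-pixel-wide line on the dark side of the step, which is the "thin contour" behavior described above.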


2.3.  Matching 

A generalized Hough transform (GHT), incorporating the gradient direction at
points, is used to perform the matching. The GHT and the specific implementation
used are described in Section 2.3.1. Certain assumptions made regarding the
orientation and scale of the object in the image are explained in Section 2.3.2.
Finally, a discussion of the gradient direction informativeness measure
introduced in Section 2.1 is given in Section 2.3.3.

2.3.1.  The  generalized  Hough  transform 

The generalized Hough transform is a fast point pattern matching algorithm that
can be used to detect arbitrary specific shapes in images (see [Ball81] and
[Davi82]). The general problem solved is to find the function that best
transforms a set of object points (i.e., the shape) into a set of image points.
This function will be expressed in terms of the parameters allowed to vary in the
transformation. The result of the generalized Hough transform is a “Hough space”
which indicates the likelihood of any particular transformation being the correct
one for the given object and image. In our application, we are only concerned
with finding a transformation function that performs translation in a plane. All
other transformation parameters (such as image scale and orientation) are assumed
to be fixed and known.

In our implementation, the object points are the boundaries of a landmark being
viewed from an (approximately) known direction and distance. The image points are
those found using the edge finding algorithm described in the previous section.
The gradient direction is calculated at both object points and image points.
Figures 2.1f and 2.2f show the edges used in the image, with the gradient
directions represented by grey levels. Figures 2.1g and 2.2g show the edges and
directions of the object boundaries. The object points are organized into an
“offset table” indexed on gradient direction, with a set of (x,y) pairs for each
gradient direction. The (x,y) pairs indicate the offsets to some arbitrary
reference point; we have used the centroid of the boundary points as the
reference point. For n gradient directions, the table is of the form

$$
\begin{array}{ll}
\phi_1: & (x_{11}, y_{11}),\ (x_{12}, y_{12}),\ \ldots \\
\phi_2: & (x_{21}, y_{21}),\ (x_{22}, y_{22}),\ \ldots \\
\vdots & \\
\phi_n: & (x_{n1}, y_{n1}),\ (x_{n2}, y_{n2}),\ \ldots
\end{array}
$$

Each edge point in the image with gradient direction φ_i may correspond to one or
several object points with gradient direction φ_i. For each possible
correspondence, there will be a potential reference point. The positions of the
reference points, relative to the edge point, are given by the (x,y) pairs for
φ_i in the offset table. Therefore, at each edge point e in the image with
gradient direction φ_i, the set of possible reference points can be obtained by
adding each offset for gradient direction φ_i to the position of e. The
generalized Hough transform operates by incrementing, in an accumulator array A,
all the possible reference points for each edge point in the image. The
accumulator array is the Hough space mentioned above.
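The offset table and voting loop can be sketched as follows. This is an illustrative reconstruction, assuming edge and boundary points arrive as `(x, y, direction_bin)` tuples with directions already quantized; `build_offset_table` and `hough_accumulate` are hypothetical names, and exact direction equality is used here (the tolerance is treated separately in Section 2.3.2).

```python
from collections import defaultdict

import numpy as np


def build_offset_table(template):
    """template: list of (x, y, direction_bin) boundary points.

    Returns {direction_bin: [(dx, dy), ...]}, the offsets from each boundary
    point to the reference point (here, the centroid of the boundary points).
    """
    cx = sum(p[0] for p in template) / len(template)
    cy = sum(p[1] for p in template) / len(template)
    table = defaultdict(list)
    for x, y, d in template:
        table[d].append((cx - x, cy - y))
    return table


def hough_accumulate(edges, table, shape):
    """edges: list of (x, y, direction_bin) image edge points; shape: (rows, cols)."""
    acc = np.zeros(shape, dtype=int)  # the accumulator array A, indexed [y, x]
    rows, cols = shape
    for x, y, d in edges:
        for dx, dy in table.get(d, ()):
            rx, ry = int(round(x + dx)), int(round(y + dy))  # candidate reference point
            if 0 <= rx < cols and 0 <= ry < rows:
                acc[ry, rx] += 1
    return acc
```

If a template is shifted by (10, 7) and its own points are used as image edges, every point casts exactly one vote at the shifted centroid, so that cell collects one vote per point while the incorrect correspondences spread over neighboring cells.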

In the general case, the maxima in the accumulator array A represent the most
likely parameter values, which characterize the best transformation functions
from the object to the image. In our case, the maxima in A represent possible
locations for the object in the image.

2.3.2.  Orientation  and  scale 

As mentioned above, we assume we know the orientation and scale of the object in
the image within a given tolerance. The template can therefore be scaled and
rotated to match the appearance of the object in the image. This eliminates the
need for the generalized Hough transform to include scale and orientation
parameters; however, the implementation should still allow for errors in both
parameters. This section describes how our implementation allows for small
errors.


Orientation errors affect the algorithm through mismatches of gradient direction
between image edge points and boundary points in the template. A mismatch can
occur because of errors in the measurement of the gradient direction, noise near
the edge point in the image, actual local differences between the object's
silhouette in the image and the template, or a grossly inaccurate assumption of
the orientation of the object in the image.

Two  measures  have  been  taken  in  an  effort  to  increase  the  likelihood  of  the 
correct  reference  point  being  incremented  for  each  of  the  above  cases.  First, 
when  the  gradient  direction  is  calculated,  it  is  rounded  to  the  nearest  10  degrees. 
Second,  during  the  matching  process,  instead  of  matching  points  only  when  their 
gradient  directions  are  equal,  edge  points  with  gradient  directions  within  ±15 
degrees  of  a  template  point’s  gradient  direction  are  also  matched. 
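The two tolerance measures can be expressed as a bin lookup. The sketch assumes gradient directions are measured over the full 0 to 360 degree circle and stored as 10-degree bin indices; the function names are hypothetical. With 10-degree bins, a ±15 degree tolerance admits the bin itself and its two immediate neighbors, since bins 20 degrees away fall outside it.

```python
BIN_DEGREES = 10                  # directions rounded to the nearest 10 degrees
NUM_BINS = 360 // BIN_DEGREES     # assumes directions span the full 0-360 degree circle


def quantize_direction(angle_degrees):
    """Round a gradient direction to the nearest 10-degree bin index."""
    return int(round(angle_degrees / BIN_DEGREES)) % NUM_BINS


def matching_bins(bin_index, tolerance_degrees=15):
    """Bins whose directions lie within +/- tolerance of this bin (wrapping at 360)."""
    k = tolerance_degrees // BIN_DEGREES  # 15 // 10 == 1: +/-10 qualifies, +/-20 does not
    return [(bin_index + j) % NUM_BINS for j in range(-k, k + 1)]


def tolerant_offsets(table, edge_bin, tolerance_degrees=15):
    """Collect offset-table entries from every template bin matching an edge bin."""
    offsets = []
    for b in matching_bins(edge_bin, tolerance_degrees):
        offsets.extend(table.get(b, ()))
    return offsets
```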

Small errors in scale affect the algorithm by incrementing points in the Hough
space which fall just short of or just beyond the actual reference point. This
creates a faint inverse silhouette of the object in the Hough space whose size
indicates the magnitude of the scaling error. When the error is very small (one
or two pixels), the inverse silhouette is just a small diffuse dot, so
post-processing of the Hough space can find the center of the dot by local
averaging.

2.3.3.  Gradient  direction  informativeness 

It often happens that one or several gradient directions are so prevalent in the
image that they produce strong voting clusters in Hough space at incorrect
locations. If a gradient direction occurs at N edge points in the image and at M
template boundary points, then MxN increments are made for that gradient
direction in the Hough space. Since usually only a small fraction of the edge
points in the image are part of the object's boundary, the remainder of the
matches can contribute to false peaks in the Hough space.

Also,  if  a  gradient  direction  is  prevalent  in  the  template,  then,  as  a  group, 
points  with  that  gradient  direction  will  contribute  more  correct  votes  than  would 
points  with  an  infrequent  gradient  direction.  This  is,  of  course,  because  there  will 
be  more  boundary  points  with  the  prevalent  gradient  direction  incrementing 
potential  reference  points.  Therefore,  when  the  reference  point  happens  to  be  the 
correct  location  for  the  object,  more  of  the  votes  contributing  to  its  peak  will 
come  from  points  with  the  prevalent  gradient  direction  than  from  points  with  an 
infrequent  gradient  direction. 

To use these observations to best advantage, a measure of gradient direction
informativeness (GDI) was developed to rate the gradient directions. Then, only
those points whose gradient directions rate highly are used in the matching. In
this way, we can eliminate uninformative sources of spurious patterns in the
Hough space and make best use of the most informative points. The measure used is

$$\mathrm{GDI}(G) = \frac{P(G \mid t)}{P(G \mid i)}$$

where P(G | t) is the probability that gradient direction G occurs in the
template and P(G | i) is the probability that gradient direction G occurs in the
image. The actual probabilities are extracted from histograms of the template and
the image. Based on this measure, only the most informative 15 percent of the
edge points in the image are used in the matching. Consequently, boundary points
in the template whose gradient directions are not selected will not be used.

It can be seen that gradient directions that occur often in the template but
infrequently in the image rate very high on this scale. Conversely, gradient
directions with few occurrences in the template but many in the image rate very
low. Points with such gradient directions would yield a high number of unrelated
votes, cluttering the Hough space and creating false peaks.
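A sketch of the GDI filter follows, assuming points are `(x, y, direction_bin)` tuples and that the probabilities are estimated from raw direction counts. The function names are hypothetical, and ties at the 15 percent cut are broken by input order, a detail the text does not specify.

```python
from collections import Counter


def gdi_scores(template_bins, image_bins):
    """GDI(G) = P(G | template) / P(G | image), estimated from direction histograms."""
    t, i = Counter(template_bins), Counter(image_bins)
    nt, ni = len(template_bins), len(image_bins)
    # Only directions present in the image need a score; those absent from the
    # template score 0 and therefore rank last.
    return {g: (t[g] / nt) / (i[g] / ni) for g in i}


def select_informative(image_points, template_points, fraction=0.15):
    """Keep the most informative `fraction` of image edge points, plus the
    template points whose directions are among those kept."""
    scores = gdi_scores([p[2] for p in template_points], [p[2] for p in image_points])
    ranked = sorted(image_points, key=lambda p: scores[p[2]], reverse=True)
    kept_image = ranked[:max(1, int(round(fraction * len(ranked))))]
    kept_bins = {p[2] for p in kept_image}
    kept_template = [p for p in template_points if p[2] in kept_bins]
    return kept_image, kept_template
```

In an image dominated by one direction that is rare in the template, that direction receives the lowest score and is the first to be discarded, which is the situation in which the text reports the measure to be most useful.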

This  measure  has  proved  very  successful  in  several  tests.  Figures  2.3  and  2.4 
show  the  Hough  space  for  two  pictures  with  varying  degrees  of  filtering  using  the 
gradient  informativeness  measure.  It  is  clearly  most  useful  when  a  few  gradient 
directions,  which  are  not  essential  to  locating  the  object,  dominate  the  image. 

Figure  2.5  shows  the  object  points  actually  used  in  the  matching  for  our  two 
examples.  In  each  case,  the  most  informative  15  percent  were  used. 

2.4.  Finding  the  peaks  in  the  Hough  space 

The matching algorithm described in Section 2.3 produces a two-dimensional Hough
space the same size as the image being searched (Figures 2.1h and 2.2h). The
local intensity peaks in this image represent possible locations for the object
in the image. To avoid making false conclusions when near ties occur, we produce
a list of the possible peaks with their respective confidences. This allows the
decision about which peak represents the actual location to be passed to
higher-level decision-making systems. The process of producing the list from the
Hough space can be thought of as several passes of simple neighborhood operators:


(1)  Sum  the  votes  in  a  KxK  neighborhood. 

(2)  Perform  non-maximum  suppression  on  a  JxJ  neighborhood,  i.e.,  eliminate  all 
points  having  a  neighbor  in  a  JxJ  neighborhood  with  a  higher  sum. 

(3)  Compute  confidences  of  the  remaining  points. 

This  process  could  be  quite  inefficient  if  it  were  computed  on  array- 
formatted  pictures  of  any  significant  size;  therefore,  some  reasonable  limits  were 
imposed  on  the  numbers  and  values  of  points  that  were  of  importance  at  each 
step  in  the  process.  The  resulting  algorithm,  using  these  limits  and  lists  of  sorted 
points,  is  as  follows. 

(1) The N points having the highest vote counts are selected and sorted into a
list on vote count. (N is chosen large enough that the correct object location is
not eliminated; fifty has proved sufficient for all cases tried.)

(2)  All  points  whose  vote  count  is  below  M  percent  of  the  highest  value  are 
eliminated. 

(3)  For  the  remaining  points,  the  vote  count  is  replaced  by  a  sum  of  the  vote 
counts  in  a  K  x  K  neighborhood  and  a  new  list  sorted  on  this  value. 

(4) In this new list, points whose summed count is below M percent of the highest
in the list are again eliminated.

(5) Non-maximum suppression is used to identify the local maxima. Starting at the
low end of the sorted list of summed points, each point, call it s, is compared
to each of the points below it. If a point t below point s is in a JxJ
neighborhood centered on s, then t is eliminated. The algorithm must start at the
bottom of the sorted list so that points are not eliminated before they have a
chance of eliminating others.
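The five list-based steps can be sketched as below. N = 50 comes from the text; the values of M, K and J are not given, so the defaults here (50 percent, 3 and 5) are placeholders, and `find_peaks` is a hypothetical name. For step (5), the sketch uses the observation that the bottom-up elimination removes exactly those points that have a higher-ranked point within their JxJ neighborhood.

```python
import numpy as np


def window_sum(acc, x, y, k):
    """Sum of votes in the k x k window centred on (x, y), clipped at the borders."""
    r = k // 2
    return int(acc[max(0, y - r):y + r + 1, max(0, x - r):x + r + 1].sum())


def find_peaks(acc, n=50, m_percent=50.0, k=3, j=5):
    """Return surviving peaks (x, y) in rank order and their k x k vote sums."""
    rows, cols = acc.shape
    # (1) The n highest-voted points, sorted on vote count.
    order = np.argsort(acc, axis=None, kind="stable")[::-1][:n]
    points = [(int(i % cols), int(i // cols)) for i in order]
    # (2) Drop points below m_percent of the highest count.
    best = acc[points[0][1], points[0][0]]
    points = [(x, y) for x, y in points if acc[y, x] >= m_percent / 100.0 * best]
    # (3) Replace counts by k x k window sums and re-sort.
    sums = {p: window_sum(acc, p[0], p[1], k) for p in points}
    points.sort(key=lambda p: sums[p], reverse=True)
    # (4) Drop points below m_percent of the highest sum.
    best = sums[points[0]]
    points = [p for p in points if sums[p] >= m_percent / 100.0 * best]
    # (5) Non-maximum suppression: keep a point only if no higher-ranked point
    # lies within its j x j neighborhood.
    r = j // 2
    peaks = [(x, y) for idx, (x, y) in enumerate(points)
             if all(abs(x - qx) > r or abs(y - qy) > r for qx, qy in points[:idx])]
    return peaks, {p: sums[p] for p in peaks}
```

Working on a short sorted list rather than on the whole array is what keeps the neighborhood passes cheap, as the text notes.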

A measure of confidence is now calculated for each remaining point. It is
designed to indicate both the strength of the peak and how it relates to other
surviving peaks. If it is necessary to compare confidence measures for peaks in
several images of different size and edge content, then this confidence measure
should be normalized.


If V_x is the number of votes for location x (summed over a KxK neighborhood)
and there are n possible positions, then the confidence measure is computed as
follows:

$$C_x = \frac{V_x}{\sum_{i=1}^{n} V_i} \times 100$$

In words, the confidence measure C_x is the number of votes that location x
received (summed over a KxK neighborhood centered on x), expressed as a
percentage of the total votes given to all points (summed over KxK neighborhoods)
in the list of possible positions.
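The confidence formula is a one-line normalization. In this sketch, `confidences` is a hypothetical helper that takes a mapping from each surviving peak to its KxK vote sum V_x; by construction the results sum to 100.

```python
def confidences(window_sums):
    """C_x = 100 * V_x / sum_i V_i over the n surviving peaks.

    window_sums maps each surviving peak to its K x K vote sum V_x.
    """
    total = sum(window_sums.values())
    return {peak: 100.0 * v / total for peak, v in window_sums.items()}
```

For two surviving peaks with sums 16 and 8, the confidences are about 66.7 and 33.3 percent.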

Figures 2.1i-j and 2.2i-j show the most confident peak found using this
algorithm. The template boundary points are overlaid on both the Hough space and
the original image.

2.5.  Suggestions  for  improvements 

Certain simplifying assumptions are made in the MATCHER. Although some are
reasonable for the intended application, in general they make the system less
robust. One such assumption, that the scale and orientation of the landmark in the
image are known, can be eliminated by including the scale and image orientation
of the landmark as parameters in the Hough transform. Although this would result
in a four-dimensional Hough space, the range of values in each dimension could be
limited, keeping the computation time reasonable.




3.  The  FINDER 


3.1.  Overview 

This  chapter  describes  a  strategy  (the  FINDER)  for  determining  bearings  to 
a  given  set  of  landmarks.  The  FINDER  is  also  given  specifications  for  images  in 
which  it  can  expect  to  find  these  landmarks.  It  then  controls  the  camera  to 
obtain  these  images  and  uses  the  MATCHER  to  establish  likely  locations  for  the 
landmarks  in  their  respective  images.  Since  the  search  for  any  specific  landmark 
may  result  in  several  possible  locations  for  that  landmark  (at  most  one  of  which, 
of  course,  can  be  correct),  we  employ  a  simple  geometric  constraint  propagation 
algorithm  to  eliminate  many  of  the  false  locations. 

The  geometric  constraint  propagation  algorithm  considers  possible  locations 
for  a  pair  of  landmarks  and  determines  if  they  could  both  be  the  correct  location 
for  their  respective  landmarks.  Two  possible  locations  (or,  more  briefly,  peaks) 
are  called  consistent  if  they  meet  this  criterion.  The  details  of  this  consistency 
computation  are  described  below.  With  consistency  determined  for  all  pairs  of 
peaks,  a  graph  is  then  constructed  in  which  nodes  are  peaks,  and  arcs  represent 
the  mutual  consistency  between  two  peaks.  Analysis  of  this  graph  can  determine 
consistency  among  groups  of  more  than  two  peaks  and  therefore  eliminate  peaks 
based  on  more  than  just  pairwise  inconsistency. 
3.2. Consistency between peaks

To determine consistency between two peaks p1 and p2 for landmarks L1 and L2, we
first calculate a range of possible angular differences between L1 and L2 based
on the vehicle's position uncertainty. We then extend this range by the pointing
error and check that the measured angular difference between p1 and p2 falls
within this range.¹ See Figure 3.1.

The angular difference between L1 and L2 is determined by simply taking the
difference of their bearings. The range of angular differences is then obtained
by letting the value for the current position vary according to the position
uncertainty of the vehicle. For the purposes of this analysis, we assume that the
position uncertainty can be represented by a solid disc on the local ground plane.
If we make the reasonable assumption that neither landmark lies inside the disc,
then it is easy to show that the positions which give the maximum and minimum
angular differences will always lie on the circumference of the disc. To see
this, we give the following informal argument.

Assume the contrary, that p is a point inside the disc for which the angle L1pL2
is maximum. Consider a line which bisects this angle. Since p is inside the disc,
there must exist a point p' on that line which is also inside the disc and is
closer to both L1 and L2. See Figure 3.2. Since p' lies inside the triangle
L1pL2, the angle L1p'L2 is greater than L1pL2. This contradicts our
assumption.² A similar argument can be applied if L1pL2 is assumed to be a
minimum.
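The pairwise consistency test can be approximated numerically before turning to the analytic solution. The sketch below relies on the argument just given and simply samples the circumference of the uncertainty disc; the sample count, the function names, and widening both ends of the range by the pointing error are assumptions made for illustration.

```python
import math


def angle_between(px, py, L1, L2):
    """Signed angle at (px, py) from the bearing of L1 to the bearing of L2, in radians."""
    v1 = (L1[0] - px, L1[1] - py)
    v2 = (L2[0] - px, L2[1] - py)
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    return math.atan2(cross, dot)


def angular_difference_range(L1, L2, center, r, samples=3600):
    """Approximate (min, max) of angle L1-p-L2 for p in the disc of radius r.

    With both landmarks outside the disc the extremes lie on the circumference,
    so only the circle is sampled.
    """
    cx, cy = center
    values = [angle_between(cx + r * math.cos(a), cy + r * math.sin(a), L1, L2)
              for a in (2 * math.pi * i / samples for i in range(samples))]
    return min(values), max(values)


def peaks_consistent(measured, L1, L2, center, r, pointing_error):
    """Pairwise test: does the measured peak-to-peak angle fall in the widened range?"""
    lo, hi = angular_difference_range(L1, L2, center, r)
    return lo - pointing_error <= measured <= hi + pointing_error
```

For landmarks at (10, 0) and (0, 10) seen from a disc of radius 2 about the origin, the angle ranges from roughly 75.9 to 108.7 degrees, so with a 1 degree pointing error a measured 120 degrees would mark the pair of peaks as inconsistent.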

¹ We could also account for the error due to pixel size, but do not, since it is negligible.

² Note that this does not imply that the solution points will be at the intersection of the bisector and the
circumference of the disc. We could find no such simple intuitive solution to this problem.


From these positions we can then calculate directly the maximum and minimum
angular difference between L1 and L2. An abbreviated derivation of an analytic
solution for these points follows.

We define the positions of the two landmarks L1 and L2 to be (x1, y1) and
(x2, y2) in map coordinates. The vehicle's current estimated position is
(x0, y0) in map coordinates, and the disc of uncertainty is centered on this
location and has radius r. We then transform all locations from map coordinates
into a coordinate system centered on the vehicle position. All coordinates from
now on will be in this new coordinate system.

Two lines with slopes m1 and m2 meet at an angle ψ (measured counter-clockwise
from line 1 to line 2) given by

$$\psi = \arctan\left(\frac{m_2 - m_1}{1 + m_1 m_2}\right). \tag{1}$$

If (x, y) is the vehicle's actual position, (x1, y1) and (x2, y2) are the
locations of the two landmarks, and

$$m_1 = \frac{y_1 - y}{x_1 - x}, \qquad m_2 = \frac{y_2 - y}{x_2 - x},$$

then equation (1) determines the angular difference between the two landmarks.

We can find the extrema of ψ by differentiating equation (1) and setting the
derivative equal to zero. Since (x, y) is constrained to lie on the disc's
circumference, we can represent (x, y) as (r cos θ, r sin θ). Note that there is
only one variable, θ, since r is a constant.


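This makes

$$m_1 = \frac{y_1 - r\sin\theta}{x_1 - r\cos\theta} \quad\text{and}\quad m_2 = \frac{y_2 - r\sin\theta}{x_2 - r\cos\theta}.$$

Multiplying the numerator and denominator of the argument of (1) by
(x1 - x)(x2 - x) gives ψ = arctan(s/t). Differentiating and simplifying, we
obtain

$$\frac{d\psi}{d\theta} = \frac{1}{1 + s^2/t^2} \cdot \frac{t\,\frac{ds}{d\theta} - s\,\frac{dt}{d\theta}}{t^2} = \frac{t\,\frac{ds}{d\theta} - s\,\frac{dt}{d\theta}}{t^2 + s^2} \tag{2}$$

with

$$
\begin{aligned}
s &= -r\cos\theta\,(y_2 - y_1) + x_1 y_2 - x_2 y_1 - r\sin\theta\,(x_1 - x_2), \\
t &= -r\sin\theta\,(y_2 + y_1) + y_1 y_2 + x_1 x_2 - r\cos\theta\,(x_1 + x_2) + r^2.
\end{aligned}
$$
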

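Setting equation (2) to zero, we obtain

$$0 = t\,\frac{ds}{d\theta} - s\,\frac{dt}{d\theta} = -r\,(N + M\sin\theta + P\cos\theta) \tag{3}$$

where

$$
\begin{aligned}
N &= r\,(y_2^2 - y_1^2 + x_2^2 - x_1^2), \\
M &= -y_1 y_2^2 + y_1^2 y_2 + x_1^2 y_2 + r^2 (y_1 - y_2) - x_2^2 y_1, \\
P &= x_1 (-y_2^2 - x_2^2) + x_2 y_1^2 + x_1^2 x_2 + r^2 (x_1 - x_2).
\end{aligned}
$$

Since cos θ = x/r, sin θ = y/r and y = ±√(r² - x²), we can rearrange equation
(3), isolating the M term and squaring, to get the quadratic

$$0 = (P^2 + M^2)\,x^2 + 2NPr\,x + (N^2 - M^2)\,r^2. \tag{4}$$

Because squaring discards the sign of y, the sign at each root is chosen so that
equation (3) is satisfied.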


At the solution points of (4), we calculate the bearing of each individual
landmark and subtract, using the equation

$$\psi = \phi_2 - \phi_1 = \arctan\left(\frac{y_2 - y}{x_2 - x}\right) - \arctan\left(\frac{y_1 - y}{x_1 - x}\right) = \arctan(m_2) - \arctan(m_1).$$

Calculating ψ for each solution, we get the maximum and minimum angular
differences between the two landmarks L1 and L2 given a position uncertainty of
±r.
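The analytic route can be checked numerically. The following sketch, assuming vehicle-centred coordinates, landmarks outside the disc, and NumPy's polynomial root finder for the quadratic, evaluates N, M and P, solves (4), resolves the sign of y from (3), and returns the smallest and largest ψ. `extreme_angles` is a hypothetical name; comparing its output with dense sampling of the circle is a convenient sanity check.

```python
import math

import numpy as np


def angle_between(x, y, L1, L2):
    """Signed angle at (x, y) from landmark L1 to landmark L2, i.e. atan2(s, t)."""
    (x1, y1), (x2, y2) = L1, L2
    s = (x1 - x) * (y2 - y) - (y1 - y) * (x2 - x)
    t = (x1 - x) * (x2 - x) + (y1 - y) * (y2 - y)
    return math.atan2(s, t)


def extreme_angles(L1, L2, r):
    """(min, max) angular difference for positions on the circle of radius r
    about the origin (vehicle-centred coordinates), via quadratic (4)."""
    (x1, y1), (x2, y2) = L1, L2
    N = r * (y2**2 - y1**2 + x2**2 - x1**2)
    M = -y1 * y2**2 + y1**2 * y2 + x1**2 * y2 + r**2 * (y1 - y2) - x2**2 * y1
    P = x1 * (-y2**2 - x2**2) + x2 * y1**2 + x1**2 * x2 + r**2 * (x1 - x2)
    scale = abs(N) + abs(M) + abs(P) + 1.0
    candidates = []
    for root in np.roots([P**2 + M**2, 2 * N * P * r, (N**2 - M**2) * r**2]):
        if abs(root.imag) > 1e-6 * r:
            continue  # complex root: no stationary point here
        x = float(np.clip(root.real, -r, r))
        h = math.sqrt(r * r - x * x)
        # Squaring lost the sign of y: keep the sign(s) that satisfy equation (3).
        residual = {y: abs(N + M * y / r + P * x / r) for y in (h, -h)}
        best = min(residual.values())
        candidates += [angle_between(x, y, L1, L2)
                       for y, res in residual.items() if res <= best + 1e-9 * scale]
    if not candidates:
        raise ValueError("degenerate configuration: no stationary points found")
    return min(candidates), max(candidates)
```

For landmarks at (10, 0) and (0, 10) with r = 2, N = 0 and M = -P, so equation (3) reduces to sin θ = cos θ; the stationary points are (√2, √2) and (-√2, -√2), giving roughly 108.7 and 75.9 degrees.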
At this point we extend this range to allow for the pointing error of the
pan/tilt mechanism. This is done by simply adding the error to the range.

3.3. Consistency Graph Analysis

As mentioned above, the consistency graph represents consistency relations
between the possible locations for different landmarks. Ideally, we would want to
determine the maximal complete subgraphs (MCSs) of this graph because they would
represent the largest sets of landmark locations that are all mutually
consistent. For small graphs this is not impractical, but for large graphs we
might be forced, due to time constraints, to perform a simpler analysis.

We can, for example, apply certain simple iterative tests to the graph that would
eliminate any landmark location not part of at least a k-clique. In what follows,
we identify two simple tests for eliminating nodes not part of k-cliques. These
processes are similar to so-called “discrete relaxation” algorithms; see, e.g.,
Haralick and Shapiro [Hara79].

First, we can iteratively eliminate all nodes which do not have arcs to nodes
representing at least k other distinct landmarks. After this process is complete,
we can then eliminate all nodes which are not the center of what we refer to as a
k-fan. A node n is the center of a k-fan if there exists a connected chain of
nodes of distinct landmarks, of length k-1, in which each element of the chain is
connected to n. Figure 3.4 contains an example of applying both node deletion
processes to a graph of landmark locations. Finally, we find all MCSs of this
pruned graph.
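Both node-deletion tests can be sketched on an adjacency-set graph. This is an illustrative version, assuming nodes are hashable peak identifiers with a `landmark_of` mapping; the function names are hypothetical, and the fan test is re-applied until no further nodes are removed, which the text does not require. The landmark-count threshold `m` is left as a parameter; note that a node in a k-clique is adjacent to nodes of k-1 other landmarks.

```python
def prune_by_landmark_count(graph, landmark_of, m):
    """Repeatedly delete nodes adjacent to nodes of fewer than m distinct other landmarks.

    graph: {node: set(neighbors)}; landmark_of: {node: landmark id}. Input is not modified.
    """
    graph = {n: set(nbrs) for n, nbrs in graph.items()}
    changed = True
    while changed:
        changed = False
        for n in list(graph):
            others = {landmark_of[v] for v in graph[n]} - {landmark_of[n]}
            if len(others) < m:
                for v in graph.pop(n):
                    graph[v].discard(n)
                changed = True
    return graph


def is_fan_center(graph, landmark_of, n, length):
    """True if a chain of `length` nodes from distinct landmarks (other than n's)
    exists with every chain node adjacent to n and consecutive nodes adjacent."""
    around = graph[n]

    def extend(chain, used):
        if len(chain) == length:
            return True
        for v in graph[chain[-1]] & around:
            if landmark_of[v] not in used and extend(chain + [v], used | {landmark_of[v]}):
                return True
        return False

    return any(extend([v], {landmark_of[n], landmark_of[v]})
               for v in around if landmark_of[v] != landmark_of[n])


def prune_by_fans(graph, landmark_of, k):
    """Delete nodes that are not the center of a k-fan (a chain of k-1 nodes)."""
    graph = {n: set(nbrs) for n, nbrs in graph.items()}
    changed = True
    while changed:
        changed = False
        for n in list(graph):
            if not is_fan_center(graph, landmark_of, n, k - 1):
                for v in graph.pop(n):
                    graph[v].discard(n)
                changed = True
    return graph
```

A four-node ring of distinct landmarks shows why the second test is useful: every node passes the landmark-count test with m = 2, yet no node has two mutually connected neighbors, so the fan test with k = 3 removes them all.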
Since we could end up with several MCSs, we now need a way to determine which is
the actual set of landmark locations. To do this, we define an evaluation function
on the MCSs and pick the MCS that scores best under it. In our current system, we
use a simple summation of the confidences of the possible locations in each MCS.
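Finding the MCSs of the pruned graph is maximal-clique enumeration. The sketch below uses the Bron-Kerbosch algorithm with pivoting (a standard choice, not one the report names) and then applies the summed-confidence evaluation; `maximal_cliques` and `best_mcs` are hypothetical names, and the graph is assumed to be a symmetric `{node: set(neighbors)}` mapping.

```python
def maximal_cliques(graph):
    """Enumerate maximal complete subgraphs with Bron-Kerbosch (with pivoting)."""
    cliques = []

    def expand(r, p, x):
        # r: current clique; p: candidates that extend it; x: already-explored nodes.
        if not p and not x:
            cliques.append(r)
            return
        pivot = max(p | x, key=lambda u: len(graph[u] & p))
        for v in list(p - graph[pivot]):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(graph), set())
    return cliques


def best_mcs(graph, confidence):
    """Pick the MCS whose summed peak confidences are largest."""
    return max(maximal_cliques(graph), key=lambda c: sum(confidence[n] for n in c))
```

Note that a larger MCS does not automatically win under this evaluation: a two-node set of high-confidence peaks can outscore a three-node set of weaker ones.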
4. The SELECTOR

4.1.  Overview 

This chapter describes a strategy (the SELECTOR) for selecting a set of landmarks
whose identification in appropriate images would improve the current estimate of
the vehicle's position. The SELECTOR supplies subsets of these landmarks, with
appropriate image specifications, to the FINDER, which returns the most likely
relative positions for each landmark in each subset. The SELECTOR then computes a
new estimate of the vehicle's location and the uncertainty associated with it. If
this new uncertainty is still too large, the SELECTOR can either simply accept it
as the best achievable result or try to further improve the position estimate
using other landmarks.

Given a database of visual landmarks, a variety of strategies can be employed to select a subset of those landmarks for identification. Implementing any of these strategies requires the ability to determine both how easily a given landmark can be identified and how its identification would affect the vehicle's position uncertainty. The development of these abilities is described in Sections 4.2 and 4.3. Further discussion of the SELECTOR module continues in Section 4.4.
4.2. Determining ease of identification

Many factors affect the ease of identification of a landmark. Some examples are the size of the landmark in the image being matched, the stability and geometric complexity of the landmark's model from the current vantage point, and the position of the sun. In a more sophisticated system, we may have information about the visual surroundings of the landmark and be able to consider the landmark's relative uniqueness in the image as a factor. In this section we consider two factors: 1) the ability to obtain an image that will allow us to accurately locate a particular landmark, and 2) the suitability of the landmark's model for use with the MATCHER.

4.2.1.  Suitable  image  verification 

The two quantities that determine an image are the camera direction and the focal length (or, equivalently, the field of view). The direction is constrained by the fact that extremely bright scenes (i.e., those containing light sources) will most likely saturate the camera and produce a pure white or washed-out image; it would clearly be futile to search for a landmark in such an image. The directions in which such scenes occur could be predicted using special instruments or by analyzing failures of the MATCHER. To determine the camera direction, we simply use the bearing of the landmark with respect to the current presumed vehicle position, which can be calculated directly from the coordinates of the landmark and the vehicle. We can then verify that the camera does not point towards a “bright scene”.
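The following is a minimal sketch of the direction computation and the bright-scene check. The conventions are assumptions, not the report's. Positions are ground-plane coordinates, and bearings are in degrees measured counterclockwise from the x-axis. The bright sectors are a hypothetical list of (start, end) bearings, which might come from instruments or from earlier MATCHER failures.

```python
import math


def bearing_deg(vehicle, landmark):
    """Bearing of the landmark from the vehicle, in degrees in [0, 360)."""
    dx = landmark[0] - vehicle[0]
    dy = landmark[1] - vehicle[1]
    return math.degrees(math.atan2(dy, dx)) % 360.0


def in_sector(angle, start, end):
    """True if angle lies in the sector swept counterclockwise from start to end."""
    angle, start, end = angle % 360.0, start % 360.0, end % 360.0
    if start <= end:
        return start <= angle <= end
    return angle >= start or angle <= end  # sector wraps through 0 degrees


def points_at_bright_scene(bearing, bright_sectors):
    """True if the camera direction falls inside any (hypothetical) bright sector."""
    return any(in_sector(bearing, s, e) for s, e in bright_sectors)
```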


There are at least four constraints on the focal length appropriate for a suitable image.

(1) The focal length must be short enough for the image to contain the landmark after accounting for all the errors in our estimate of its position.

(2) The focal length must be long enough to ensure that locating the landmark would improve the position estimate.

(3) The focal length must be long enough to guarantee that the landmark will appear in the image with a size (in pixels) large enough for it to be reliably located.

(4) The above three constraints must be satisfied by at least one focal length within the range of focal lengths available from our imaging system.

The  first  three  constraints  are  discussed  in  the  next  three  sections.  The  last 
constraint  and  the  verification  process  are  addressed  in  the  following  section. 

4.2.1.1. Determining the minimum field of view

In order to ensure that we obtain an image large enough to contain a particular landmark, we need to know the physical size of the landmark, its position, the vehicle's current position, the vehicle's current position uncertainty, and the pointing error of the camera control system. These parameters determine the field of view (fov) necessary to include the landmark.

To determine the minimum field of view needed, we use the method illustrated in Figure 4.1. The factors considered are the size of the landmark and our ability to point the camera at the landmark. Our ability to point the camera in the correct direction depends on how well the landmark's bearing can be approximated from our current position, and on how precisely the camera can be pointed in that direction.


Our approximation of a landmark's bearing can only be as accurate as our approximation of the current vehicle location. If a set of halflines emanating from the landmark is extended through all the possible vehicle locations, these halflines form a wedge with an angular width θ. This angle θ is the angular uncertainty in the bearing between the landmark and the vehicle, as seen from either one.

Now, a landmark L with width W will subtend an angle φ in the field of view that depends on the distance D to the landmark. Strictly, the relevant distance is D minus the focal length, i.e., the distance from the landmark to the center of focus; since the focal length is very small compared with D, we can approximate this distance by D. Therefore, L subtends approximately

$$\phi \approx \arctan\left(\frac{W}{D}\right).$$

The camera control mechanism will certainly have an inherent angular uncertainty (pointing error), which we denote by ψ. The same is true of the heading feedback equipment; we denote this orientation uncertainty by α.

We can now simply add these four angular uncertainties together to arrive at a total field of view which, if centered on our best approximation of the landmark's location, is guaranteed to contain the landmark. Therefore, the minimum fov for L is

$$\text{minimum-FOV} = \theta + \phi + \psi + \alpha.$$

Let film size denote the width (in millimeters) of the camera's digitizing surface. The minimum-FOV can then be obtained by using a focal length max-fl given by

$$\text{max-fl} = \frac{\text{film size}}{\tan(\text{minimum-FOV})}.$$
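The two formulas above translate directly into code. This Python sketch works in radians and fills in one quantity the text leaves open. To obtain θ, it assumes the vehicle's position uncertainty is a disc of radius r centered at distance D from the landmark. Under that assumption, the wedge of halflines through the disc has angular width 2 arcsin(r/D). The function names are illustrative.

```python
import math


def wedge_angle(r, D):
    """theta: angular width of the halflines from the landmark through a disc of
    radius r centred at distance D (assumes a circular uncertainty region, r < D)."""
    if r >= D:
        raise ValueError("vehicle uncertainty region contains the landmark")
    return 2.0 * math.asin(r / D)


def minimum_fov(W, D, r, psi, alpha):
    """minimum-FOV = theta + phi + psi + alpha, all angles in radians."""
    theta = wedge_angle(r, D)
    phi = math.atan(W / D)  # angle subtended by a landmark of width W at distance D
    return theta + phi + psi + alpha


def max_focal_length(film_size, fov):
    """max-fl = film size / tan(minimum-FOV), in the same unit as film_size."""
    return film_size / math.tan(fov)
```

For example, take a landmark 0.3m wide at 6m, with r = 0.5m and ψ = α = 0.5°. This gives a minimum-FOV of about 13.4 degrees, most of it due to θ.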


4.2.1.2. Satisfying the accuracy requirement

The objective of the SELECTOR is to reduce the vehicle's position uncertainty to a specified amount. Since the new vehicle location is determined from the bearings of landmarks with known positions, the accuracy of those bearings determines the accuracy of the new vehicle position. We now describe how to express this accuracy requirement as a minimum focal length for the image in which we identify the landmark.

The achieved angular accuracy is determined by the pointing error of the pan/tilt mechanism, the error in orientation estimation, and the width of a pixel in the image where the landmark is located. A pixel subtends a certain angle in the field of view (called the pixel angle³) that depends on the focal length of the lens, the width of the camera's imaging surface, the spatial resolution (number of pixels across) of the camera, and the position of the pixel in the image. Since the position of the landmark in the image cannot be known at this point, we approximate it by the center pixel of the image.⁴ Using this approximation, we have

$$\text{pixel angle} = \arctan\left(\frac{\text{film size}}{\text{focal length} \cdot \text{resolution}}\right).$$

The pointing error, ψ, and the orientation error, α, are expressed in degrees in the pan and tilt directions. The errors in the pan direction are then added to the pixel angle to get the total angular error. That is,

$$\text{total angular error} = \psi + \alpha + \arctan\left(\frac{\text{film size}}{\text{focal length} \cdot \text{resolution}}\right). \tag{6}$$

³ This is also known as the instantaneous field of view, or IFOV, since the pixel is the result of one instantaneous sampling of the scene by the imaging device.

⁴ This is not a critical assumption, since even in the largest field of view the pixel angle varies by only a very small amount across the entire image.

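To calculate the focal length needed to achieve the required accuracy, we simply solve equation (6) for the focal length:

$$\text{minimum focal length} = \frac{\text{film size}}{\tan\big(\text{required angular error} - (\psi + \alpha)\big) \cdot \text{resolution}},$$

where the required angular error is the largest total angular error that the accuracy specification allows for this landmark.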

With this minimum focal length established, we can then determine whether the bearing of a landmark found in a particular image meets the accuracy specification given for that landmark.
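Equation (6) and its solution for the focal length can be sketched as below. Angles are in radians, and `required_error` stands for the largest total angular error the accuracy specification allows. The two early returns are additions to the text. They cover the cases where no focal length, or any focal length, would meet the requirement.

```python
import math


def pixel_angle(film_size, focal_length, resolution):
    """Angle subtended by the centre pixel (the IFOV), in radians."""
    return math.atan(film_size / (focal_length * resolution))


def total_angular_error(psi, alpha, film_size, focal_length, resolution):
    """Equation (6): pointing error + orientation error + pixel angle."""
    return psi + alpha + pixel_angle(film_size, focal_length, resolution)


def minimum_focal_length(required_error, psi, alpha, film_size, resolution):
    """Solve equation (6) for the focal length that just meets required_error."""
    budget = required_error - (psi + alpha)  # angle left over for the pixel itself
    if budget <= 0:
        return math.inf  # pointing and orientation errors alone exceed the requirement
    if budget >= math.pi / 2:
        return 0.0  # any focal length satisfies the requirement
    return film_size / (math.tan(budget) * resolution)
```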

4.2.1.3. Determining minimum size

We still need to ensure that the landmark we are searching for will appear in this initial image with a size that preserves at least some of its unique identifying features. The minimum spatial resolution (number of pixels across the landmark), min-pixels, needed to ensure this is determined a priori for each landmark. The minimum focal length, min-fl, is then given by

$$\text{min-fl} = \frac{\text{film size}}{\tan(\text{min-pixels} \cdot \text{pixel angle})}.$$

4.2.1.4. Constraint checking

To visualize the restrictions that these constraints place on our choice of focal length, refer to Figure 4.2, where they are displayed on a focal length axis. The thick part of the axis represents the range of available focal lengths. Two of the constraints specify ranges bounded by minimum values and one specifies a range bounded by a maximum value. If these ranges overlap in a region that lies at least partly within the range of available focal lengths, then a suitable image is obtainable. In fact, the most desirable image, in light of these constraints, is most likely the one with the longest focal length in this acceptable region.

If the limiting factor for our choice of focal length is the maximum available focal length, then we can simply use a window from an image with a shorter focal length, as long as this shorter focal length obeys the other constraints. This, in fact, was done in the implementation of Chapter 5. Note that in a more sophisticated system, if the limiting factor is the minimum available focal length (i.e., the acceptable range lies entirely below it), then several higher-resolution images with smaller fields of view could be used to systematically cover the field of view that the original image would have covered.
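Constraint checking reduces to intersecting intervals on the focal length axis. This sketch uses illustrative names, with all lengths in millimeters. It takes the bound from Section 4.2.1.1 as an upper limit and the accuracy and size bounds as lower limits. The lens range, e.g. the 20mm to 80mm zoom of Chapter 5, limits both ends. The function returns the longest acceptable focal length. It does not model the windowing described above.

```python
def choose_focal_length(max_fl, min_fl_accuracy, min_fl_size, lens_min, lens_max):
    """Longest focal length satisfying constraints (1)-(4), or None if none does."""
    lower = max(min_fl_accuracy, min_fl_size, lens_min)  # constraints (2), (3) and the lens
    upper = min(max_fl, lens_max)                         # constraint (1) and the lens
    if lower > upper:
        return None  # constraint (4) fails: the ranges do not overlap
    return upper     # the most desirable image has the longest acceptable focal length
```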

4.2.2.  Landmark  model/MATCHER  suitability 

Recall that the MATCHER selects points to match based on gradient direction. Points are matched only if their gradient direction appears frequently in the model compared with its occurrence in the image. Without prior knowledge of the image, we can only assume that a model with a uniform distribution of gradient directions would be best. This is because if the model has a frequent gradient direction that also occurs frequently in the image, then this gradient direction will probably not be matched, causing a disproportionately large number of model points to go unmatched. Therefore, we can use, for example, the standard deviation of the gradient direction counts as a measure of how uneven the model's distribution is. This measure can then be compared with those of other landmarks to help determine the best choice.
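Here is a sketch of the unevenness measure. It histograms the model's gradient directions and takes the standard deviation of the bin counts. The 36 ten-degree bins are an assumption, since the report does not say how directions were quantized.

```python
import math
from collections import Counter


def gradient_direction_unevenness(directions_deg, n_bins=36):
    """Population standard deviation of the model's gradient-direction histogram.

    Lower values mean a more uniform distribution, which the text argues is preferable."""
    width = 360.0 / n_bins
    counts = Counter(min(int((d % 360.0) // width), n_bins - 1) for d in directions_deg)
    values = [counts[b] for b in range(n_bins)]
    mean = sum(values) / n_bins
    return math.sqrt(sum((v - mean) ** 2 for v in values) / n_bins)
```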


4.3.  Determining  the  effect  of  identification 

This section describes how to determine the extent to which finding a particular landmark's bearing would affect the vehicle's position uncertainty. To do this, we first explain how that landmark's bearing would be used (with other landmarks' bearings) to determine the new vehicle location. Then, we show how to obtain an estimate of the uncertainty of this new vehicle location.

Given a pair of bearings (B₁, B₂) for two landmarks with known positions (x₁, y₁) and (x₂, y₂), we can find the actual vehicle location by intersecting the line passing through (x₁, y₁) with angle B₁ and the line passing through (x₂, y₂) with angle B₂. See Figure 4.3. If the bearing Bᵢ to landmark Lᵢ is only known to within ±θᵢ, then the possible lines passing through (xᵢ, yᵢ) sweep out a wedge Wᵢ of angular width 2θᵢ on the ground plane. See Figure 4.4a. Since, for each landmark Lᵢ found, the vehicle is constrained to lie in the planar wedge Wᵢ, the vehicle must lie in the convex polygon formed by the intersection of these wedges. See Figure 4.4b.
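Intersecting two bearing lines is a small linear solve. In this sketch the bearings are angles in radians in the ground-plane frame, which is an assumed convention. A line at angle B is the same line as one at B + π. It therefore does not matter whether a bearing points from the vehicle to the landmark or back.

```python
import math


def intersect_bearings(p1, b1, p2, b2):
    """Intersect the line through p1 at angle b1 with the line through p2 at angle b2."""
    d1 = (math.cos(b1), math.sin(b1))
    d2 = (math.cos(b2), math.sin(b2))
    denom = d1[0] * d2[1] - d1[1] * d2[0]  # 2D cross product d1 x d2
    if abs(denom) < 1e-12:
        raise ValueError("bearing lines are (nearly) parallel")
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    # p1 + t*d1 = p2 + s*d2  =>  t = ((p2 - p1) x d2) / (d1 x d2)
    t = (dx * d2[1] - dy * d2[0]) / denom
    return (p1[0] + t * d1[0], p1[1] + t * d1[1])
```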

The size and shape of this convex polygon are determined by the width of each wedge where the wedges intersect and by the angles at which they intersect. The width Uᵢ of a wedge Wᵢ at a distance dᵢ from Lᵢ is given by

$$U_i = 2\, d_i \tan\theta_i,$$

where ±θᵢ is the uncertainty of the landmark bearing as calculated in Equation (6). Therefore, the effect of finding Lᵢ's bearing on the vehicle location uncertainty is proportional to the distance from Lᵢ to the actual vehicle location and, for small angles, to the angular uncertainty θᵢ of the bearing. Since the actual vehicle location is not known at this point, we approximate it by the assumed current position.
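The vehicle location is approximated by the assumed current position. Uᵢ can therefore be evaluated for each candidate landmark before any image is taken and used to rank them. The record layout (name, position, θᵢ) and the ranking helper are illustrative assumptions.

```python
import math


def wedge_width(distance, theta):
    """U_i = 2 d_i tan(theta_i): width of wedge W_i at distance d_i from landmark L_i."""
    return 2.0 * distance * math.tan(theta)


def rank_by_wedge_width(landmarks, assumed_position):
    """Order (name, (x, y), theta) records by U_i at the assumed vehicle position,
    narrowest wedge (largest reduction in uncertainty) first."""
    def width(rec):
        _, (x, y), theta = rec
        d = math.hypot(x - assumed_position[0], y - assumed_position[1])
        return wedge_width(d, theta)
    return [rec[0] for rec in sorted(landmarks, key=width)]
```

For instance, a bearing known to within ±0.5° from a landmark 5m away gives a wedge about 0.087m wide at the vehicle.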

Now, since the vehicle is known to be within a certain distance r of the assumed current position, we can initially constrain the vehicle to lie within a convex region centered on the current position. For the purposes of this analysis, we assume that this region is a square with sides of length 2r. Therefore, initially, the effect of finding landmark Lᵢ is that of finding the intersection of the wedge Wᵢ with the square region of uncertainty. We now briefly discuss a simple method for finding this intersection and evaluating its size.

A natural representation for the wedges and the initial uncertainty region is the intersection of halfplanes, primarily because each wedge is unbounded in one direction and our initial uncertainty region is convex. The initial uncertainty region is represented by the intersection of four halfplanes. We then add to this set the two halfplanes that represent the first wedge. We now have the intersection of six halfplanes, defining a new convex polygon⁵ (see Figure 4.5a). As wedges representing subsequent landmarks are added to the set of halfplanes, the convex polygon resulting from the intersection of the halfplanes in the set gets smaller and smaller (see Figure 4.5b).
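One simple way to realize the halfplane representation is to start from the square's four vertices and clip the polygon by each new halfplane in turn. Each clip is a Sutherland-Hodgman step for a single clipping line. In this sketch, each halfplane is written as a·x + b·y ≤ c, and the wedge bearing is taken as the direction from the landmark toward the vehicle. θ must be below 90°. These conventions and names are assumptions. The area helper is one possible size measure, included only as a check.

```python
import math


def square_region(cx, cy, r):
    """Initial uncertainty region: square of side 2r centred on the assumed position."""
    return [(cx - r, cy - r), (cx + r, cy - r), (cx + r, cy + r), (cx - r, cy + r)]


def wedge_halfplanes(landmark, bearing, theta):
    """Two halfplanes a*x + b*y <= c whose intersection is the wedge of directions
    bearing +/- theta emanating from the landmark (requires theta < pi/2)."""
    lx, ly = landmark
    c1, s1 = math.cos(bearing - theta), math.sin(bearing - theta)
    c2, s2 = math.cos(bearing + theta), math.sin(bearing + theta)
    return [
        (s1, -c1, s1 * lx - c1 * ly),   # left of the lower edge ray
        (-s2, c2, -s2 * lx + c2 * ly),  # right of the upper edge ray
    ]


def clip(polygon, halfplane):
    """Clip a convex polygon (list of vertices) to the halfplane a*x + b*y <= c."""
    a, b, c = halfplane
    out = []
    for i, p in enumerate(polygon):
        q = polygon[(i + 1) % len(polygon)]
        fp = a * p[0] + b * p[1] - c
        fq = a * q[0] + b * q[1] - c
        if fp <= 0:
            out.append(p)  # p is inside
        if fp * fq < 0:    # edge p->q crosses the boundary line
            t = fp / (fp - fq)
            out.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
    return out


def polygon_area(polygon):
    """Shoelace formula."""
    n = len(polygon)
    s = sum(polygon[i][0] * polygon[(i + 1) % n][1] - polygon[(i + 1) % n][0] * polygon[i][1]
            for i in range(n))
    return abs(s) / 2.0
```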

To express the uncertainty represented by a convex polygon as a single parameter, we find the two vertices that are farthest apart. Half of the distance between these two vertices is a reasonable approximation of the “radius” of this polygon. This “radius” can be compared with the original r to determine the extent to which the vehicle's position uncertainty has been reduced.

⁵ The intersection of any number of halfplanes is convex; when it is bounded and nonempty, as here, it is a convex polygon.
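The “radius” measure described above is a direct computation over vertex pairs. This minimal sketch uses `math.dist`, which needs Python 3.8 or later.

```python
import math
from itertools import combinations


def polygon_radius(vertices):
    """Half the largest distance between any two vertices of the polygon."""
    if len(vertices) < 2:
        return 0.0
    return max(math.dist(p, q) for p, q in combinations(vertices, 2)) / 2.0
```

Applied to the initial square of side 2r, this measure returns r√2 rather than r. A like-for-like comparison may therefore use the square's own value as the baseline.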
4.4. The SELECTOR continues


By  maximizing  both  the  ease  of  detection  and  the  effect  of  identification  as 
discussed  in  the  previous  sections,  the  SELECTOR  arrives  at  a  set  of  landmarks. 
During  this  process,  it  calculates  a  direction  and  optimal  focal  length  for  each 
landmark  that  specify  an  image  in  which  that  landmark  can  be  identified.  The 
SELECTOR  then  directs  the  FINDER  to  find  the  subsets  of  landmarks  in  their 
respective  images.  (One  might  search  for  only  a  subset  to  limit  the  amount  of 
effort  devoted  to  a  potentially  fruitless  search;  if  some  critical  subset  cannot  be 
identified,  a  completely  new  set  of  landmarks  could  be  chosen.) 

The FINDER returns its best estimate of the locations of those landmarks, along with the bearings calculated for them. From these bearings and the locations of the landmarks, the SELECTOR then computes new estimates of the vehicle's actual location and its uncertainty. These calculations were described in Section 4.3. If all the landmarks were found as expected, then this new uncertainty will meet the initial uncertainty requirement. However, if only some of the selected landmarks were found, they could, in some circumstances, still determine an acceptable uncertainty. If they do not, the SELECTOR can choose another combination of landmarks for analysis.


5. An Implementation

This chapter describes a partial implementation, in an indoor environment, of the system described in the first four chapters. The FINDER and MATCHER are completely implemented, to the extent possible with the available equipment. The SELECTOR is partially implemented; the parts not implemented on the computer were done by hand, and the results were supplied to the system when needed.

5.1.  Experimental  Environment 

The environment was a terminal room adequately lit from above by several rows of fluorescent lights. The camera used was a Cohu 2800 with a manually adjustable zoom lens having a range of 20mm to 80mm. The camera was mounted on a large tripod with manual adjustments for pan, tilt, and spin. Programming was done in both C and Franz Lisp on a VAX 11/780 running under the UNIX operating system (BSD 4.2). For information on the interface between Franz Lisp and C, see [Andr84].

5.2.  Programming 

Top-level control of both the SELECTOR and the FINDER is done with the YAPS production rule system, for flexibility and extensibility. YAPS integrates naturally with Franz Lisp, since the right-hand side of a YAPS rule can be an arbitrary sequence of Lisp expressions. Lisp functions make up most of the FINDER and SELECTOR and are called from the right-hand side of simple YAPS rules or from other Lisp functions.

The database consists of YAPS facts and knowledge stored as Lisp data structures. The YAPS facts are used for high-level decision making and the firing of YAPS rules. The Lisp data structures are generally symbols whose property lists contain global information for the system; for instance, all the landmark information is stored in this manner.

The MATCHER is in two parts: the Hough transform and the Hough space analysis. Both were written entirely in C but are called as Lisp functions. The same is true of many other functions that require a large amount of numerical computation, such as the calculation of the minimum and maximum angular differences between landmarks (Section 3.1.1).

5.3.  Operation  of  the  system 

As mentioned above, only parts of the SELECTOR module are currently implemented: the computation of the minimum image needed to include a particular landmark, and the triangulation to arrive at the new position. Therefore, in this example the author functioned as the SELECTOR for many tasks. What follows is a simplified step-by-step description of the operation of the system, divided into sections labeled with the module currently in control.

5.3.1.  The  SELECTOR  selects 

The room was first examined for interesting landmarks that could be easily located and seemed widely enough separated to result in a small uncertainty for the new position. It was decided that three landmarks would be a good number for demonstrating the system: a wall phone, a coffee cup taped to a blackboard, and a wall outlet with a cord plugged into it. The positions and sizes of these landmarks were measured precisely and entered into the database. The layout of the landmarks in the room is shown in Figure 5.1. The existence of suitable images for these landmarks was then verified, followed by the determination of the suitable images with the longest focal lengths. They were as follows:


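Image specifications

Landmark   Bearing of image center (degrees)   Focal length (mm)   FOV (degrees)
phone      58                                  54                  10
cup        98                                  98                  6
plug       139                                 114                 5.1

(The 98mm and 114mm focal lengths exceed the 80mm maximum of the lens; as noted in Section 4.2.1.4, windows from images taken at a shorter focal length were used in such cases.)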

Templates  for  eventual  use  by  the  MATCHER  were  then  created  for  these 
landmarks.  This  was  done  by  obtaining  high  resolution  images  in  which  the 
landmarks  were  distinct  from  the  surroundings.  These  images  were  then  reduced 
to  the  size  they  would  appear  at  from  the  vehicle’s  assumed  current  position. 
The  gradient  direction  was  then  calculated  at  strong  edge  points  to  obtain  a  set 
of  object  points  suitable  for  the  MATCHER  (see  Chapter  2). 

The  landmarks  chosen  were  then  specified  to  the  FINDER  along  with  the 
corresponding  images  specified  above. 
5.3.2. The FINDER obeys


The  FINDER  then  requested  that  the  camera  be  positioned  to  obtain  each  of 
the  three  images  designated.  These  images  are  shown  in  Figure  5.2.  After  each 
one  was  obtained,  it  called  the  MATCHER  routines  to  find  likely  locations  for 
the corresponding landmark in the image. The MATCHER used a Laplacian of size 3x7, the top 25 percent of the edge points, the top 15 percent of those ranked by gradient direction informativeness, and matched image points whose gradient directions were within 15 degrees of an object point's gradient direction. This resulted in the following:


Possible image locations (candidate: image coordinates, confidence)

Phone        Cup                  Plug
...          ...                  ...
             cup2: (60,66) 38     plug3: (30,26) 19

Figure 5.3 shows these possible locations on the original images. These three lists of likely locations, with their respective confidences, were then checked against each other for angular consistency as described in Chapter 3. The resulting consistency graph is shown in Figure 5.4a. The graph was then pruned, resulting in the removal of 3 of the 8 nodes. Figure 5.4b shows the graph after pruning, and Figure 5.5 shows the remaining possible locations overlaid on the images. The maximal complete subgraphs were three 3-cliques, as follows:


Maximal complete subgraphs

1) ( phone1(100) cup1(63) plug1(25) )

2) ( phone1(100) cup1(63) plug4(18) )

3) ( phone1(100) cup1(63) plug5(17) )


These maximal complete subgraphs were then evaluated using the confidences to arrive at the best, which was the third one above. In Figure 5.6, the template points that were actually used in the matching are shown overlaid on their respective original images at the final locations. The bearings were then calculated for these image locations and the values passed back to the SELECTOR.

5.3.3.  The  SELECTOR  triangulates 

The SELECTOR then simply triangulated, using the landmark bearings and their known locations, to get the new position. The new uncertainty was then calculated as described in Chapter 4. These results are displayed graphically in Figure 5.7.
6. Related Literature


Ballard introduced the generalization of the Hough transform [Ball81] and presented the parameterization used in this thesis in Ballard and Brown [Ball82]. Davis presented extensions to the generalized Hough transform for matching hierarchically organized point patterns or patterns of line segments [Davi82]. Hakalahti, Harwood and Davis [Haka83] constrained matching based on local properties of contour points, and Davis, Kitchen, Hu and Hwang [Davi83] used the generalized Hough transform to match patterns of blobs and ribbons.

The generalized Hough transform can also be used for recognizing three-dimensional objects in images. If our landmarks were represented as 3D models, and only very poor estimates of our position were available, then such 3D matching would be of interest. Silberberg [Silb84a, Silb84b] considered the special case where only position and rotation on the ground plane are unknown, which would be a good representation for our problem. See also Ballard and Sabbah [Ball83], and Stockman and Esteva [Stoc84].

Research on autonomous vehicles has led to several papers describing other methods of automatic position determination. Fukui [Fuku81] presented a method that uses a specific diamond-shaped landmark and the distortion of its shape in the image to determine the vehicle's relative position. This process was extended in Courtney and Aggarwal [Cour83]. Related papers under the general category of camera calibration are Hung, Yeh and Mansbach [Hung83] and Hung, Yeh and Harwood [Hung84]. Methods using acoustic or laser ranging sensors are described in Crowley [Crow83] and Jarvis [Jarv83], respectively.




References 


[Alle82a] - E. M. Allen, R. Trigg, and R. Wood, Maryland Artificial Intelligence Group Franz Lisp Environment, CS-TR-1266, Computer Science Department, University of Maryland (1982).

[Alle82b] - E. M. Allen, YAPS: Yet Another Production System, CS-TR-1146, Computer Science Department, University of Maryland (1982).

[Andr84] - F. P. Andresen, The Franz Lisp - C Interface, CAR-TR-1411, Computer Vision Laboratory, University of Maryland (1984).

[Ball81] - D. H. Ballard, Generalizing the Hough transform to detect arbitrary shapes, Pattern Recognition 13, 111-122 (1981).

[Ball82] - D. H. Ballard and C. M. Brown, Computer Vision, Prentice-Hall, Englewood Cliffs, New Jersey, 482 (1982).

[Ball83] - D. H. Ballard and D. Sabbah, Viewer independent shape recognition, IEEE Trans. on Pattern Analysis and Machine Intelligence 5, 653-660 (1983).

[Cour83] - J. W. Courtney and J. K. Aggarwal, Robot guidance using computer vision, Proceedings of the IEEE 71, 57-62 (1983).

[Crow83] - J. L. Crowley, Dynamic world modeling and position estimation for an intelligent mobile robot, 7th International Conference on Pattern Recognition, Montreal, Canada (1984).

[Davi82] - L. S. Davis, Hierarchical generalized Hough transforms and line-segment based generalized Hough transforms, Pattern Recognition 15, 277-285 (1982).

[Davi83] - L. S. Davis, Image matching using generalized Hough transforms, CAR-TR-27, Computer Vision Laboratory, University of Maryland (1983).

[Davi85] - L. S. Davis, Visual algorithms for autonomous navigation, submitted to IEEE International Conference on Robotics and Automation (1985).

[Fuku81] - I. Fukui, TV image processing to determine the position of a robot vehicle, Pattern Recognition 14, 101-109 (1981).

[Hara79] - R. Haralick and L. Shapiro, The consistent labeling problem, IEEE Trans. on Pattern Analysis and Machine Intelligence 1, 173-183 (1979).

[Harw84] - D. Harwood, M. Subbarao, H. Hakalahti, and L. S. Davis, A new class of edge-preserving smoothing filters, CAR-TR-59, Computer Vision Laboratory, University of Maryland (1984).

[Hung83] - Y. Hung, P. Yeh, and P. Mansbach, Calibration of a structured light vision system, CAR-TR-29, Computer Vision Laboratory, University of Maryland (1983).

[Hung84] - Y. Hung, P. Yeh, and D. Harwood, Passive ranging to known planar point sets, CAR-TR-65, Computer Vision Laboratory, University of Maryland (1984).

[Jarv83] - R. A. Jarvis, A laser time-of-flight range scanner for robotic vision, IEEE Trans. on Pattern Analysis and Machine Intelligence 5, 505-512 (1983).

[Mora83] - H. P. Moravec, The Stanford Cart and the CMU Rover, Proceedings of the IEEE 71, 872-884 (1983).

[Silb84a] - T. M. Silberberg, D. Harwood, and L. S. Davis, Object recognition using oriented model points, First Conference on Artificial Intelligence Applications, Denver, Colorado (1984).

[Silb84b] - T. M. Silberberg, L. S. Davis, and D. Harwood, An iterative Hough procedure for three-dimensional object recognition, Pattern Recognition, to appear.

[Stoc84] - G. Stockman and J. C. Esteva, Use of geometrical constraints and clustering to determine 3D object pose, Proceedings: ICPR-7, Montreal, Canada, 742-744 (1984).

[Waxm85] - A. Waxman, J. LeMoigne, and B. Srinivasan, Visual navigation of roadways, submitted to IEEE International Conference on Robotics and Automation (1985).


Figure 2.1 Follows the MATCHER through each stage of image processing for locating a tape dispenser near a coffee maker. Panels include c) zero-crossings of the Laplacian of a Gaussian, d) symmetric contrast operator applied, i) the solution overlaid on the Hough space, and j) the solution overlaid on the original image.

Figure 2.2 The same as Figure 2.1 except for locating a brush. Panels: a) the original image, b) symmetric K-nearest neighbor smoothing, c) zero-crossings of the Laplacian of a Gaussian, d) symmetric contrast operator applied, e) edge points picture, f) gradient directions at edge points.

Figure 2.3 Results of gradient direction informativeness (GDI) filtering for the tape dispenser: a) using 100% of the gradient directions, b) using 50%, c) using 30%, d) using 15%.

Figure 2.4 Same as Figure 2.3 except done for the brush.

Figure 2.5 The templates after GDI filtering: a) the tape template, b) the brush template.

Figure 3.1 Consistency between peaks: a) determining the range of angular differences, b) the angular difference between peaks.

Figure 3.4 Graph pruning with k-stars and k-fans: a) original graph; c) after two iterations of k-star (bold arcs are after three iterations). Nodes marked O will be deleted in the next iteration.

Figure 4.1 Determining minimum-FOV.

Figure 4.2 Constraint checking.

Figure 5.5 Results of pruning shown on images: b) cup, c) plug.

Figure 5.7 Results showing new position and uncertainty.